Alzheimer’s disease (AD) is a gradually progressing, irreversible
neurodegenerative disorder. Mild cognitive impairment convertible (MCIc) is
the clinical forerunner of AD. Precise diagnosis of MCIc is essential for
effective treatments that slow the progression of the disease. The other
cognitive states included in this study are mild cognitive impairment
non-convertible (MCInc) and cognitively normal (CN). MCInc is a stage in
which aged people suffer from memory problems, but the stage does not lead
to AD. The classification between MCIc and MCInc is crucial for the early
detection of AD. In this work, an algorithm is proposed that concatenates
the output layers of the pre-trained Xception, InceptionV3, and MobileNet
models. The algorithm is tested on baseline T1-weighted structural
magnetic resonance imaging (MRI) images obtained from the Alzheimer’s
disease neuroimaging initiative database. The proposed algorithm achieved a
multi-class classification accuracy of 85%. It also achieved an accuracy of
85% for classifying MCIc vs MCInc, 94% for classifying AD vs CN, and 92%
for classifying MCIc vs CN. The results show that the proposed algorithm
outperforms other state-of-the-art methods for multi-class classification
and for classification between MCIc and MCInc.
1. INTRODUCTION


Alzheimer’s disease (AD) is a brain shrinkage disorder whose primary sign is memory loss. AD
progressively worsens over the years. It affects a person’s daily activities and eventually leads to
death. The four cognitive states of the human brain are cognitively normal (CN), mild cognitive impairment
convertible (MCIc), mild cognitive impairment non-convertible (MCInc), and AD. AD cannot be cured
completely, but the rate of shrinkage can be reduced if it is detected at the early stage, MCIc.

Conventional clinical examinations with different imaging modalities fail to detect AD at its
early stage, MCIc. Advanced image processing techniques have to be applied to distinguish MCIc from
MCInc and AD. State-of-the-art methods apply various image processing techniques for the early
diagnosis of AD. These methods include diagnosis using hand-crafted features and deep learning models. Deep
learning models outperform most of the methods based on hand-crafted features, because
high-dimensional hand-crafted features are difficult to process. The rest of the
section discusses the significant deep learning algorithms for the early detection of AD.

Researchers have put forward many significant works for the early detection of AD using deep learning
algorithms. A latent feature representation using a stacked autoencoder is proposed by Suk and Shen [1].
The algorithm considers low-level features like gray matter tissue volumes that improve diagnostic accuracy for
early detection of AD. An algorithm based on a 3D convolutional neural network and sparse autoencoder is
proposed by Payan and Montana [2]. The algorithm improves the three-class (AD vs CN vs MCI) classification
accuracy. In [3], 2D convolutional neural networks and a sparse autoencoder are used to classify AD, CN, and
MCI. Compared to the algorithm proposed by Payan and Montana [2], that of Gupta et al. [3] is simpler as it uses 2D
convolution. However, Gupta et al. [3] do not exploit 3D spatial information, while Payan and Montana [2]
utilize the 3D spatial information of magnetic resonance imaging (MRI) images. Valliani and Soni [4]
claimed that non-biomedical pre-trained models like ResNet [5] learn cross-domain
features that enable the model to extract significant low-level features from MRI images to improve the
classification accuracy. Their algorithm also shows the benefit of data augmentation before learning.

The algorithm proposed by McCrackin [6] generates 3D multi-channel feature maps based on
Voxception-Resnet for the classification between AD and CN. Data augmentation is performed before
generating the feature maps. The algorithm is implemented on diffusion magnetic resonance imaging (MRI)
images. As in [4], the work in [7] uses a non-biomedical pre-trained model, visual geometry group
(VGG-16), to learn cross-domain features and increase the accuracy. The algorithm proposes a
mathematical model based on transfer learning with VGG-16 and achieves remarkable three-class
classification accuracy. The ensemble-based algorithm proposed by Pan et al. [8] combines the features from
sagittal, coronal, and transverse slices of MRI images. Data augmentation is performed to avoid over-fitting.
Two-stage ensemble learning is implemented in this algorithm. In the first stage, three base classifiers
ensemble the sagittal, coronal, and transverse slices separately. In the second stage, another base classifier
ensembles the three-axis slices. Each base classifier consists of six convolution layers. The outputs from the multiple
base classifiers are combined to improve the classification accuracy.

The algorithm proposed by Islam and Zhang [9] is an ensemble of three slightly different deep
convolutional neural networks. Each individual model has the following four basic operational layers: 1) convolution,
2) batch normalization, 3) rectified linear unit, and 4) pooling. The model focuses on four-class classification,
while the majority of works focus on either binary classification or three-class classification. Here too,
data augmentation is performed to expand the dataset. In [10], end-to-end learning of a convolutional neural
network-based model is implemented for three-class classification. The features are learned directly from the raw data without
any specialist control. In this work, the input data is transformed into a lower-dimensional space using a
convolutional autoencoder. In [11], a convolutional neural network integrates the features from MRI and
positron emission tomography (PET) images of the hippocampal area for the detection of AD. Here, the
hippocampal area is selected as the region of interest (ROI). Since the different modalities are combined,
the algorithm provides decent results for the classification of AD vs CN, MCIc vs CN, and MCIc vs MCInc.

The algorithm proposed by Sun et al. [12] is an efficient dual-functional 3D convolutional neural
network for three-class classification and accurate bilateral hippocampus segmentation. Accurate
hippocampus segmentation helps to increase classification accuracy. The algorithm uses V-Net
convolutional blocks with a bottleneck architecture to reduce the scaling while maintaining the segmentation
accuracy. The review by Al-Shoukry et al. [13] lists and analyzes recent works in the field of
early detection of AD using deep learning algorithms. The work points out that prediction of AD at the
early stage deserves much more attention than the diagnosis of AD. The algorithm proposed by Ju et al. [14]
works with functional MRI images along with medical information including age, gender, and genetic
information. A stacked autoencoder is used to train the deep neural network on functional MRI
time-series data or correlation coefficient data. Wen et al. [15] review numerous algorithms based on
convolutional neural networks (CNN) and MRI in the field of early detection of AD. The work also
proposes an open framework for reproducible evaluation. In [16], a 3D local directional pattern is implemented,
which computes the orientation around each voxel. The algorithm shows less sensitivity to illumination and noise.

In Shao et al. [17], multi-kernel support vector machines and hypergraph-based regularisation are
utilised to combine shared features from many modalities. According to the findings, the method provides
classification accuracy that is higher than that of previous multi-modality techniques. The algorithm’s
primary flaw is that all hyperedge weights are set to 1 without distinguishing between the hyperedges. In [18],
wavelets, the Gabor filter, and the Gaussian of local descriptors are employed as tools for feature extraction,
together with a support vector machine (SVM) classifier. Three separate SVMs, each trained
with a different feature descriptor, are combined in the system. In [19], clinical and texture characteristics are
used to identify the transition stage of MCIc. The key benefit is that MCI and AD are classified using the
MRI texture of the entire brain. An approach to feature selection that makes use of a multivariate general linear model
is suggested in [20]. The general linear model is used to model the modest intensity fluctuations from CN to MCIc.
Additionally, multivariate adaptive regression splines are utilised as the classifier.
High classification accuracy was achieved in [21] by using texture features derived from the
elliptical neighbourhood, however at a significant computational expense.

In Maguolo et al. [22], different activation functions used in convolutional neural networks for medical
applications are compared. In [23] and [24], the efficiency of ensembling pre-trained networks for medical
applications is demonstrated. The algorithm used by Liu et al. [25] is based on a deep CNN that uses 3D DenseNet
to learn both hippocampus segmentation features and classification features. Many of the papers
listed do not specifically address the distinction between MCIc and MCInc [26]. Few of the works that
have addressed this classification report an accuracy of more than 70%. Multi-class classification and MCIc
vs MCInc classification are the main focus of this work.


2. METHOD 

In this work, a deep learning algorithm for the multi-class classification of AD stages is presented. The proposed
algorithm is based on an ensemble of pre-trained models. The block diagram of the proposed system is
shown in Figure 1. It consists of four parts: 1) separating the middle slice from the MRI image, 2) normalization,
3) augmentation, and 4) the ensemble model.

In this work, data are taken from the Alzheimer’s disease neuroimaging initiative (ADNI) database.
According to the ADNI central database acquisition protocol, a three-dimensional sagittal T1-weighted image
sequence with 1.2 mm slice thickness is acquired at a field strength of 1.5 T. The age and the number
of samples of each category used in this study are given in Table 1. There are 54 cognitively normal subjects,
52 mild cognitive impairment non-convertible subjects, 58 mild cognitive impairment convertible subjects,
and 72 Alzheimer’s disease subjects.


Table 1. Description of MRI images used in this study

Category   Number of subjects   Age (years)
CN         54                   74.12 ± 3.48
MCInc      52                   75.36 ± 2.58
MCIc       58                   76.89 ± 3.65
AD         72                   75.89 ± 3.68


[Figure 1 block diagram: Middle slice of 3D image, Normalization, Augmentation, Ensemble model]

Figure 1. Workflow of the proposed model


Initially, the MRI database is reduced to the middle slices of the 3D MRI images. The middle slice
contains significant data, while the other slices may carry redundant information. Each separated middle slice
then undergoes normalization. The slice of the MRI image is composed of pixels
with values between 0 and 255. Normalization rescales the original pixel values to the range
[0, 1], which makes the images contribute more equally to the overall loss. Otherwise, an image with a higher pixel
range produces a greater loss and requires a lower learning rate, while an image with a lower pixel range
requires a higher learning rate.
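
The middle-slice extraction and normalization steps can be illustrated with a minimal NumPy sketch. The slicing axis, the helper names `middle_slice` and `normalize_to_unit_range`, and the synthetic volume are assumptions made for illustration; the source does not state which axis is used or how the step is implemented.

```python
import numpy as np

def middle_slice(volume, axis=2):
    """Return the central 2D slice of a 3D volume along the given axis.

    The choice of axis is an assumption; the paper does not specify it.
    """
    index = volume.shape[axis] // 2
    return np.take(volume, index, axis=axis)

def normalize_to_unit_range(slice_2d):
    """Rescale 8-bit pixel values (0..255) to floats in [0, 1]."""
    return slice_2d.astype(np.float32) / 255.0

# Synthetic stand-in for a 121x145x121 volume with 8-bit intensities.
rng = np.random.default_rng(0)
volume = rng.integers(0, 256, size=(121, 145, 121), dtype=np.uint8)

slice_2d = middle_slice(volume, axis=2)
normalized = normalize_to_unit_range(slice_2d)
```

Dividing by the fixed maximum of 255, rather than by each image's own maximum, keeps the same intensity scale across all images.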

Data augmentation is performed to make more training, validation, and test images available
and to avoid overfitting. Image augmentation enlarges the dataset by creating modified versions
of the existing images, which adds variation to the dataset and improves the
capacity of the model to predict new images correctly. Data augmentation consists of four operations
for each image: flipping left to right, flipping up and down, rotation, and insertion of randomized noise.
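
The four augmentation operations can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the nearest-neighbour rotation, the Gaussian noise model, the noise standard deviation, and all function names are hypothetical; only the 45° angle comes from the caption of Figure 2.

```python
import numpy as np

def rotate_nearest(image, angle_deg):
    """Rotate a 2D image about its centre with nearest-neighbour sampling.

    Output pixels whose source falls outside the image are set to 0.
    """
    h, w = image.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    y0, x0 = ys - cy, xs - cx
    # Inverse mapping: locate the source pixel for every output pixel.
    src_y = np.rint(cy + y0 * np.cos(theta) - x0 * np.sin(theta)).astype(int)
    src_x = np.rint(cx + y0 * np.sin(theta) + x0 * np.cos(theta)).astype(int)
    inside = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    rotated = np.zeros_like(image)
    rotated[inside] = image[src_y[inside], src_x[inside]]
    return rotated

def add_gaussian_noise(image, rng, std=0.05):
    """Add zero-mean Gaussian noise (assumed noise model) and clip to [0, 1]."""
    noisy = image + rng.normal(0.0, std, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

def augment(image, rng, angle_deg=45.0, noise_std=0.05):
    """Return the four augmented versions of a normalized 2D image."""
    return [
        np.fliplr(image),                           # flip left to right
        np.flipud(image),                           # flip up and down
        rotate_nearest(image, angle_deg),           # rotation
        add_gaussian_noise(image, rng, noise_std),  # randomized noise
    ]

rng = np.random.default_rng(0)
image = rng.random((8, 8))       # stand-in for a normalized MRI slice
augmented = augment(image, rng)  # four new images per input image
```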

Samples of augmented images are given in Figure 2. The augmented images are classified using
the proposed ensemble model. The ensemble model is made up of three pre-trained networks:
Xception, InceptionV3, and MobileNet. The detailed network
architecture is given in Figure 3. The three pre-trained models are trained individually. The outputs of all
models are connected to a concatenation layer. For binary classification, the concatenation layer is
followed by a dense layer with 1024 units and then another dense layer with a single output and a
“sigmoid” activation. For multi-class classification, the output layer is instead a dense layer with four
outputs and a “softmax” activation function.
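
The forward pass of the classification head described above can be sketched in plain NumPy. The sketch makes several assumptions that the text does not confirm: the feature widths of the three backbones (2048, 2048, and 1024), the ReLU activation of the 1024-unit layer, the reuse of that layer in the multi-class case, and the random weights, which stand in for trained parameters. All names are hypothetical.

```python
import numpy as np

def dense(x, weights, bias):
    """Fully connected layer: x @ W + b."""
    return x @ weights + bias

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    shifted = x - x.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_head(features, params, multi_class):
    """Concatenate backbone outputs, apply a 1024-unit dense layer, then classify."""
    merged = np.concatenate(features, axis=-1)
    hidden = relu(dense(merged, params["w_hidden"], params["b_hidden"]))
    if multi_class:
        return softmax(dense(hidden, params["w_multi"], params["b_multi"]))
    return sigmoid(dense(hidden, params["w_binary"], params["b_binary"]))

rng = np.random.default_rng(0)
batch = 2
# Assumed widths of the Xception, InceptionV3, and MobileNet feature vectors.
feature_dims = [2048, 2048, 1024]
features = [rng.normal(size=(batch, d)) for d in feature_dims]
merged_dim = sum(feature_dims)

params = {
    "w_hidden": rng.normal(scale=0.01, size=(merged_dim, 1024)),
    "b_hidden": np.zeros(1024),
    "w_binary": rng.normal(scale=0.01, size=(1024, 1)),
    "b_binary": np.zeros(1),
    "w_multi": rng.normal(scale=0.01, size=(1024, 4)),
    "b_multi": np.zeros(4),
}

binary_prob = ensemble_head(features, params, multi_class=False)  # shape (2, 1)
class_probs = ensemble_head(features, params, multi_class=True)   # shape (2, 4)
```

Concatenation lets the dense layers weigh the features of all three networks jointly, rather than averaging three separate predictions.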
Figure 2. MRI image augmentation (from left to right: original image, flipping left to right, flipping up and
down, rotation by 45°, and noisy image)


The InceptionV3, MobileNet, and Xception networks are built on separable convolution
layers [27], [28]. Spatial separable convolution and depth-wise separable convolution are the two types of separable convolution.
The spatial separable convolution primarily deals with the spatial dimensions of the image and kernel. It
divides a kernel into two smaller kernels. This reduces the number of
multiplications and thus the system complexity, so the network runs faster than with normal convolution.
However, not all kernels can be divided into two smaller kernels. Because of this, spatial separable
convolution is not commonly used in deep learning algorithms. The depth-wise separable convolution can
work with kernels that cannot be factorized in this way. It deals with both the spatial dimensions and the depth
dimension. Depth indicates the number of channels of the image. Each channel is a particular interpretation of
the image. Depth-wise separable convolution splits the operation into a depth-wise convolution and a point-wise
convolution. Depth-wise convolution is performed on the image without changing the depth of the image.
The point-wise convolution uses a kernel of size 1x1 that iterates through every single point;
its depth equals the depth of the image. Lower computation time and different feature maps are
the advantages of depth-wise separable convolution. The main concern about depth-wise separable convolution
is that it reduces the number of parameters in a convolution. However, the depth multiplier can be set to
increase the number of parameters in the network so that it learns more about the characteristics of different images.
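
The savings described above can be made concrete with a short sketch. It factorizes a 3x3 Sobel kernel, chosen here only as a familiar separable example and not taken from the paper, and counts the weights of a standard versus a depth-wise separable convolution. Bias terms are ignored, and the layer sizes are hypothetical.

```python
import numpy as np

# Spatial separability: a 3x3 Sobel kernel is the outer product of two 1D kernels,
# so 9 multiplications per output position become 3 + 3 = 6.
sobel = np.array([[-1, 0, 1],
                  [-2, 0, 2],
                  [-1, 0, 1]])
column = np.array([1, 2, 1])
row = np.array([-1, 0, 1])
is_spatially_separable = np.array_equal(np.outer(column, row), sobel)

def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels

def depthwise_separable_params(kernel_size, in_channels, out_channels,
                               depth_multiplier=1):
    """Weights in a depth-wise convolution plus a 1x1 point-wise convolution."""
    depthwise = kernel_size * kernel_size * in_channels * depth_multiplier
    pointwise = in_channels * depth_multiplier * out_channels
    return depthwise + pointwise

# Hypothetical layer: 3x3 kernel, 32 input channels, 64 output channels.
standard = standard_conv_params(3, 32, 64)                                # 18432
separable = depthwise_separable_params(3, 32, 64)                         # 2336
separable_dm2 = depthwise_separable_params(3, 32, 64, depth_multiplier=2) # 4672
```

For this layer the separable form needs roughly an eighth of the weights, and a depth multiplier of 2 doubles that count again, which is the trade-off the paragraph describes.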


[Figure 3 block diagram: the output layers of Xception, InceptionV3, and MobileNet feed a concatenation of output layers, followed by dense layers]

Figure 3. Architecture of ensemble model


3. RESULTS AND DISCUSSION 

In this study, 3D brain MRI images with a size of 121x145x121 voxels are used as the input for
the proposed model. The 3D MRI images are Neuroimaging Informatics Technology Initiative (NIFTI)
images. The middle slice is extracted using med2image, a Python utility that converts medical images
in NIFTI format into joint
photographic experts group (JPEG) format. Due to the small dataset of images used in this study, the images
are augmented by random nonlinear transformation, rotation, and flipping. The training images and test
images are augmented separately. The model is trained in the Google Colab environment and
implemented with the deep learning toolkits Keras and TensorFlow. This model is trained from scratch
until it converges. To achieve fast convergence, a fixed learning rate of 0.01 is set, and stochastic
gradient descent is used as the optimizer to update the weight parameters. 10% of the training images are used as
validation images. The cross-entropy loss function is used to update the weights.
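
The training setup can be illustrated on a toy problem: a 10% validation split, a cross-entropy loss, and stochastic gradient descent with a fixed learning rate of 0.01. The sketch below trains only a single softmax layer on synthetic features, so the data, the batch size, the number of epochs, and every name are assumptions rather than the authors' configuration.

```python
import numpy as np

def train_validation_split(num_samples, validation_fraction, rng):
    """Shuffle sample indices and hold out a fraction for validation."""
    indices = rng.permutation(num_samples)
    num_val = int(round(num_samples * validation_fraction))
    return indices[num_val:], indices[:num_val]

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true classes."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))

def sgd_step(weights, features, labels, learning_rate=0.01):
    """One gradient step of a softmax layer on one mini-batch."""
    probs = softmax(features @ weights)
    one_hot = np.eye(weights.shape[1])[labels]
    gradient = features.T @ (probs - one_hot) / len(labels)
    return weights - learning_rate * gradient

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))           # synthetic feature vectors
true_weights = rng.normal(size=(16, 4))
labels = np.argmax(features @ true_weights, 1)  # four synthetic classes

train_idx, val_idx = train_validation_split(len(features), 0.10, rng)
weights = np.zeros((16, 4))
loss_before = cross_entropy(softmax(features[train_idx] @ weights), labels[train_idx])

batch_size = 20  # assumed; the paper does not report a batch size
for epoch in range(10):
    order = rng.permutation(train_idx)
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        weights = sgd_step(weights, features[batch], labels[batch])

loss_after = cross_entropy(softmax(features[train_idx] @ weights), labels[train_idx])
val_loss = cross_entropy(softmax(features[val_idx] @ weights), labels[val_idx])
```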
The proposed algorithm is evaluated by precision, recall (sensitivity), accuracy, and F1 score.
The formulas of these four measures are given in (1), (2), (3), and (4), respectively.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

$$\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \tag{4}$$


TP, FP, TN, and FN denote the numbers of true-positive, false-positive, true-negative, and false-negative
classifications, respectively. A TP means that the model predicts
1 and the true value is 1. A TN means that the model predicts 0 and the true value is 0. An FP means that the
model predicts 1 and the true value is 0. An FN means that the model predicts 0 and the true value is 1.
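
As a worked illustration of (1) to (4), the following sketch computes the four measures from confusion counts. The counts are hypothetical and chosen only to make the arithmetic easy to follow; the last lines recompute the F1 score from the precision and recall reported for two classes in Table 7.

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(precision_value, recall_value):
    return 2 * precision_value * recall_value / (precision_value + recall_value)

# Hypothetical confusion counts for a binary classifier.
tp, fp, tn, fn = 40, 10, 45, 5

p = precision(tp, fp)           # 40 / 50
r = recall(tp, fn)              # 40 / 45
a = accuracy(tp, tn, fp, fn)    # 85 / 100
f1 = f1_score(p, r)             # 80 / 95

# Consistency check against the reported precision and recall in Table 7.
f1_ad = round(f1_score(0.95, 0.86), 2)    # AD row
f1_mcic = round(f1_score(0.65, 0.81), 2)  # MCIc row
```

Both recomputed values, 0.90 for AD and 0.72 for MCIc, agree with the F1 scores listed in Table 7.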

The receiver operating characteristic (ROC) curves of AD vs CN, MCIc vs CN, and MCIc vs MCInc are
given in Figure 4, Figure 5, and Figure 6, respectively. A ROC curve is a graph that plots two parameters, the true positive rate in (5)
and the false positive rate in (6). It shows the performance of the classification model at different classification
thresholds. The area under the ROC curve is the two-dimensional area under the entire ROC curve from (0, 0) to
(1, 1). The area can take a value between 0 and 1. The values 0 and 1 indicate that all the predictions of the
model are wrong or correct, respectively. In this work, the area under the ROC curve is 0.99,
0.98, and 0.94 for AD vs CN, MCIc vs CN, and MCIc vs MCInc, respectively.


$$\text{TPR (Recall)} = \frac{TP}{TP + FN} \tag{5}$$

$$\text{FPR} = \frac{FP}{FP + TN} \tag{6}$$
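
The construction of a ROC curve from (5) and (6) and the area under it can be sketched as follows. The labels and scores are a small hypothetical example, the function names are illustrative, and the area is computed with the trapezoidal rule; none of this is taken from the authors' code.

```python
import numpy as np

def roc_curve_points(labels, scores):
    """Compute (FPR, TPR) pairs by sweeping the decision threshold from high to low."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    thresholds = np.concatenate(([np.inf], np.sort(np.unique(scores))[::-1]))
    positives = np.sum(labels == 1)
    negatives = np.sum(labels == 0)
    fpr, tpr = [], []
    for threshold in thresholds:
        predicted = scores >= threshold
        tp = np.sum(predicted & (labels == 1))
        fp = np.sum(predicted & (labels == 0))
        tpr.append(tp / positives)   # equation (5)
        fpr.append(fp / negatives)   # equation (6)
    return np.array(fpr), np.array(tpr)

def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve from (0, 0) to (1, 1)."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Hypothetical true labels and predicted scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr = roc_curve_points(labels, scores)
auc = area_under_curve(fpr, tpr)  # 0.75 for this example
```

In this example one negative sample scores higher than one positive sample, so one of the four positive-negative pairs is ranked incorrectly and the area is 0.75.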


Figure 4. Receiver operating characteristics of AD vs CN

Figure 5. Receiver operating characteristics of MCIc vs CN

Figure 6. Receiver operating characteristics of MCIc vs MCInc

The multi-class classification performance of the Xception, InceptionV3, and MobileNet models is
reported in Table 2, Table 3, and Table 4, respectively. Table 5 shows the accuracy of the
binary classification of AD vs CN, MCIc vs CN, and MCIc vs MCInc. The results indicate the significance
of this algorithm for MCIc vs MCInc classification. The multi-class classification accuracy of Xception,
InceptionV3, MobileNet, and the proposed ensemble model is given in Table 6. The proposed ensemble
model reaches 85%, the best of the four models. The precision, recall, and F1 score of each output class of
the proposed ensemble model are given in Table 7. The training and test accuracy over 100 epochs of multi-class
classification is given in Figure 7.

The experimental analysis shows that the proposed algorithm achieves good results in both binary
and multi-class classification. Furthermore, the proposed model is compared with state-of-the-art methods
in Table 8. Very few works address multi-class classification with the ADNI dataset, and multi-class
classification accuracy obtained on datasets other than ADNI is not included in the comparison. Many works address the
binary classification AD vs CN. In the context of early detection, however, MCIc vs
CN and MCIc vs MCInc are the significant classifications. As mentioned earlier, a better MCIc vs MCInc
classification accuracy is very promising for the early detection of AD. With the use of different layers of
separable convolution and normal convolution, the model can learn the information necessary to distinguish
MCIc from MCInc. The separable convolution layers reduce the computational complexity of the
algorithm. The results indicate that the proposed model is effective for binary and multi-class classification.

Figure 7. Training and test accuracy of multi-class classification


Table 2. Multi-class classification performance of Xception

Class   Precision   Recall   F1 score
AD      0.77        0.93     0.84
MCIc    0.77        0.55     0.64
MCInc   0.68        0.74     0.71
CN      0.89        0.72     0.80


Table 3. Multi-class classification performance of InceptionV3

Class   Precision   Recall   F1 score
AD      0.85        0.91     0.88
MCIc    0.66        0.69     0.68
MCInc   0.86        0.65     0.74
CN      0.86        0.87     0.87


Table 4. Multi-class classification performance of MobileNet

Class   Precision   Recall   F1 score
AD      0.85        0.82     0.84
MCIc    0.58        0.71     0.64
MCInc   0.77        0.65     0.71
CN      0.84        0.83     0.84


Table 5. Comparison of binary classification accuracy (%)

Model         AD vs CN   MCIc vs CN   MCIc vs MCInc
Xception      90         89           75
InceptionV3   92         90           56
MobileNet     89         90           71
Ensemble      94         92           85


Table 6. Comparison of multi-class classification accuracy (%)

Model         AD vs CN vs MCIc vs MCInc
Xception      78
InceptionV3   81
MobileNet     78
Ensemble      85


Table 7. Multi-class classification performance of proposed ensemble model

Class   Precision   Recall   F1 score
AD      0.95        0.86     0.90
MCIc    0.65        0.81     0.72
MCInc   0.76        0.70     0.73
CN      0.86        0.89     0.88


Table 8. Comparison of classification accuracy (%) of proposed model and state-of-the-art methods

Methods           AD vs CN   MCIc vs CN   MCIc vs MCInc
[1]               95.9       -            75.8
[4]               81.3       -            -
[8]               84         79           62
[11]              90.10      87.46        76.90
[17]              92.51      82.53        75.48
[25]              88.90      -            -
Proposed method   94         92           85


4. CONCLUSION 

In this work, an ensemble model is proposed to improve the accuracy of binary and multi-class
classification of AD stages. The Xception, InceptionV3, and MobileNet models are concatenated to obtain the new
ensemble model. The ensemble model succeeds in improving the classification accuracy of MCIc vs MCInc.
Since the MCIc stage is the early stage of AD, MCIc vs MCInc is a crucial classification in the context of
early detection of AD. To the best of our knowledge, not many works have achieved an MCIc vs MCInc
classification accuracy of more than 70%. In this work, an MCIc vs MCInc classification accuracy of
85% is obtained. While the majority of the existing research works focus on binary classification, this model also provides
a significant improvement for multi-class classification. The proposed algorithm can be very beneficial for
early-stage AD diagnosis. The algorithm was tested on the MRI images in the ADNI database. It
can also be tested on other classification problems in medical image processing.